ABSTRACT

Content, instructional, and curricular validity, as
related to certification tests, are examined. All three deal with
content validity, but the domain differs among the three.
Certification tests must have instructional validity, i.e., the test
must be valid both with respect to the domain used to define the
minimum competencies and the instructional content domain (what is
taught in the schools). The test items must be representative of the
objectives domain but not necessarily representative of the
instructional content domain. Whether a test has content validity
with respect to the domain specified by the curricular materials is
important only insofar as it is a surrogate for instructional
validity. Curricular validity should not be used as a criterion to
establish the instructional validity of a certification test. For
tests of certification, a relatively large percentage of the items
should represent topics that are covered by all students in the
district and/or state (to assure that certification tests have
instructional validity). A prototypic measurement of curricular and
instructional validity for elementary school mathematics illustrates
these points. (Author/PN)

Abstract

The authors examine three types of content validity relative
to certification tests such as the high school graduation test used
in Florida. They identify and give examples of the domains for
curricular, instructional, and content validity and make a case for
the necessity of establishing the overall content validity of a
test based upon the objectives that underpin the test before
questioning whether that test is also valid with respect to the
curricular material used in schools or, even more narrowly, with
respect to the actual instruction provided in the schools. The
authors describe
and illustrate a prototypic measurement of curricular and instructional
validity for elementary school mathematics.

VALIDITY AS A VARIABLE:
CAN THE SAME CERTIFICATION TEST
BE VALID FOR ALL STUDENTS?¹

William H. Schmidt, Andrew C. Porter, John R. Schwille,
Robert E. Floden, and Donald J. Freeman²

In the judicial case of Debora P. vs. Turlington, the courts
addressed the concept of validity as it pertained to the Florida
Functional Literacy examination. Since the test was to be used in
certifying a level of functional literacy required for high school
graduation, much was at stake. Out of the controversy surrounding
the examination and its use, two new types of validity emerged:
curricular validity and instructional validity. The purpose of this
paper is to explore the meaning of these two new types of validity,
to show where they fit within the psychometrics tradition, and to
consider what determines the curricular and/or instructional validity
of a test.

Three Types of Content Validity
This conference is concerned not with validity in general, but
more narrowly with the concept of content validity. The American
Psychological Association (1974) defines content validity as the
situation in which the behaviors measured in a test constitute a
representative sample of the behaviors to be exhibited in the desired
performance domain. However, the case of Debora P. vs. Turlington
raises a complication not addressed in this definition. For, while
the lower court found that the test had reasonable content validity
with respect to the skill objectives developed by the state board
of education, the appellate court maintained that there was an
additional question of curricular validity. The planners of this
conference have added to the complexity by introducing still another
term: instructional validity. What, we are asked, do these terms mean
and what are their implications for tests of certification?

Defining the Three Terms

Most large-scale testing programs, such as state assessment,
minimum competency, and certification tests, have used a set of
instructional objectives as "the desired performance domain" against
which to judge content validity.

As for curricular validity, the judges seemed to be concerned
with using the schools' curricular materials as the domain against
which to judge a test. By the same token, instructional validity
can be defined as content validity with the domain of interest being
the instructional content actually delivered by teachers in school.

The term content validity, as used by the trial court in Debora
P. vs. Turlington, referred to the extent to which the test
accurately reflected the domain specified for development of the test,
namely the set of skill objectives defined by state legislation and
acted upon by the state board of education. Curricular validity asks
whether the test, established as valid with respect to the domain of
objectives, is also consistent with the curricular materials used in
the school system where it is to be administered. Similarly,
instructional validity is a matter of whether the test, however valid
with respect to the objectives, adequately samples the instructional
content actually taught to the students. In the discussion which
follows, we refer to these three domains as the objectives domain,
the curricular materials domain, and the instructional content domain.

Relationship of the Three Domains

If one were to think of each domain as a set, the interrelationship
among the types of validity can be seen through Venn diagrams such as
the one portrayed in Figure 1. If content validity were equated with
validity for the objectives domain, as in the Florida case, the test
must adequately sample subsets A, B, E, and F. However, subset A
represents content in which students taking the test were not
instructed; neither was that content included in the materials used
by the schools.

Figure 1. Interrelationships among the types of validity.
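
The set view of the three domains can be sketched in a few lines of code. This is only an illustration under stated assumptions: each domain is treated as a set of topic labels, all topic names are hypothetical, and the region letters of Figure 1 are not reproduced because the figure itself is not available here.

```python
# Hypothetical topic labels; none of these come from the Florida test.
objectives = {"add_whole", "subtract_whole", "read_graph",
              "make_change", "compute_tax"}
materials = {"add_whole", "subtract_whole", "read_graph", "fractions"}
instruction = {"add_whole", "subtract_whole", "make_change", "decimals"}

# Objectives content found in neither the materials nor the instruction
# (the kind of content the text attributes to subset A).
untaught_and_absent = objectives - materials - instruction

# Objectives content that students were actually taught.
taught = objectives & instruction

# Objectives content present in the materials but not delivered in class.
materials_only = (objectives & materials) - instruction
```

A test that samples the whole objectives domain would, in this toy case, include at least one topic ("compute_tax") that students met neither in class nor in their materials.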



If a test has content validity with respect to both objectives
and instructional content, then it is likely that the relationship
shown in Figure 2 would obtain. In this case the objectives domain
is a proper subset of the instructional content domain. This
arrangement seems reasonable inasmuch as certification tests are
commonly based on minimal competencies. The scope of the instructional
domain is large, reflecting its lack of restriction to minimal
competencies. However, the domain of curricular materials need not be
coincident with either of the other two domains. Further, since
different children may receive different content, it is quite
conceivable that the model of Figure 2 would change across children
within the same classroom as well as across children in different
classrooms.

Figure 2. Interrelationships among types of validity.

If the test has content validity with respect to both objectives
and curricular materials, then the objectives domain is likely
a proper subset of the curricular materials domain, suggesting a
situation similar to the one above in which a test of minimal
competencies is being used.



Problems in Defining Domains

One of the problems with defining a domain concerns the level
of detail to be contained in that domain. The domain should be at a
level fine enough to make distinctions that are important but not
at a level of detail so fine as to classify everything within the
subject matter as being different. This, of course, is the trick of
being knowledgeable about (1) the subject matter and (2) the amount of
transfer in learning that can occur among the topics contained in
the domain. For if transfer of learning is straightforward between
two topics (e.g., instruction on how to add 5 + 3 enables one to
correctly do the problem 4 + 3), then a taxonomy that makes such
distinctions might be overly detailed. On the other hand, it is obvious
that at a very high level of generality, almost all topics are similar
(e.g., all items on a mathematics test deal with mathematics), so
that moving too far in either direction is not of any great
value.

Another problem in specifying domains is tied more closely to
the instructional content domain and the curricular materials domain.
This is the question of topic emphasis: Is it sufficient for a topic
to be included in the domain if it is covered in the school one time
on one day or if it is found in one problem in the textbook? If this
is not the case, then what number of hours, days, or problems is
sufficient in order for the topic to be included in the domain?
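
The question of topic emphasis amounts to choosing a threshold. The sketch below shows how the same classified material yields different domains under different thresholds. The function name, the topic labels, and the counts are hypothetical; the thresholds of 1 and 20 are chosen only because the paper's later textbook analysis uses "at least one item" and "at least 20 items".

```python
from collections import Counter

def build_domain(exposures, min_count=1):
    """Return the topics that occur at least min_count times.

    exposures: iterable of topic labels, one per textbook exercise
    (or one per logged day of instruction).
    """
    counts = Counter(exposures)
    return {topic for topic, n in counts.items() if n >= min_count}

# Hypothetical exercise classifications for one textbook.
exercises = ["column_addition"] * 25 + ["estimation"] * 3 + ["percents"]

lenient_domain = build_domain(exercises, min_count=1)
strict_domain = build_domain(exercises, min_count=20)
```

Under the lenient rule a single exercise is enough to place a topic in the domain; under the strict rule only topics with sustained practice remain.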

Making the Test Representative

Figures 1 and 2 do not address one aspect of the traditional
definition of content validity, namely, that the test be a
representative sample of behaviors from the domain. When one considers
the objectives domain alone, this property seems clearly desirable.
Otherwise, one objective (e.g., a computation objective in
mathematics) might be overemphasized to the detriment of another (e.g.,
an applications objective in mathematics). However, it is not so
clear that the original motivation for introducing the terms
curricular validity and instructional validity is best served by
retaining the requirement of representativeness. If we consider a test
of minimal competencies, for example, the requirement for
representativeness could be interpreted to mean that the content of
the curriculum materials and the content of instruction must be
limited to minimal competencies for the test to have curricular
validity or instructional validity. Otherwise, there would be content
in the materials and content in the instruction not represented on the
test. To be restrictive in this way seems undesirable. The concepts of
curricular validity and instructional validity serve, in the eyes of
the court and the planners of this conference, to provide assurance
that test content is also covered in curriculum materials and in
class. The requirement for representativeness could change the concept
of curricular and instructional validity from an assurance of
sufficient coverage to a limitation on coverage.

If the requirement for representativeness is dropped, then in a
strict sense, curricular validity and instructional validity cannot
be thought of as specific types of content validity on a par with
objectives validity. Rather, they should be thought of as
characteristics of interest for tests that have first been judged to be
valid with respect to the objectives domain (which could once again
be equated with content validity). Since validity is a matter of
degree rather than a dichotomous state, curricular validity and
instructional validity would, in practice, need to be judged directly
against the test rather than against the objectives domain.

On the other hand, the merit of requiring representativeness
as a criterion for curricular or instructional validity is that this
criterion would guard against a test giving too much weight to topics
that are minor or trivial aspects of instruction. For example, a
test of minimal competencies might be devoted entirely to basic number
facts. Would this test be considered to have curricular or
instructional validity solely on the basis that these facts were
covered in the materials or in the classroom, even if other important
aspects of the materials or classroom coverage were entirely neglected?

Thus, the question of representativeness seems to revolve around
the issue of whether curriculum materials and classroom instruction
are considered worthy indicators of content priorities in their own
right or, alternatively, whether the objectives domain is considered
a sufficient criterion of content priorities, with the curriculum
materials and classroom instruction being taken not as indicators
of content priorities but of sufficient student opportunity to
learn. There may be no general answer to this question since in
part it is dependent upon the extent to which the objectives are
viewed as authoritative. Presumably the greater the overlap among
the three domains, the more authoritative each would be viewed as
a guide for what should be tested.

A discussion of our attempts to measure the overlap between
tests, curriculum materials, and classroom instruction, which
follows later in this paper, will serve to further illustrate these
issues. Given the general nature of state assessment, minimum
competency, and certification tests, and the issues before the courts,
the answer for these types of tests seems to be the latter (i.e.,
to not require representativeness for a test to have curricular
and instructional validity).



What Type of Validity for What Type of Test?

For general aptitude or general achievement tests we would argue
that the main concern should be content validity with respect to the
domain upon which the test is to be based.

For tests of certification, it is not enough that a test have
validity with respect to the objectives domain. It should also have
content validity with respect to instructional content but not
necessarily in a representative fashion. In other words, some acceptably
large percentage of the items sampled from the objectives domain
must also be covered by every student in every classroom (Figure 2).
This is the issue of sufficient student opportunity.
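
The criterion of "some acceptably large percentage" can be made concrete. The following is a minimal sketch, assuming each test item has been assigned one topic label and each classroom's coverage is recorded as a set of topics; the function name and all data are hypothetical, and the paper does not say what percentage should count as acceptable.

```python
def share_covered_by_all(test_item_topics, classroom_coverage):
    """Fraction of test items whose topic was taught in every classroom."""
    common = set.intersection(*classroom_coverage.values())
    covered = [t for t in test_item_topics if t in common]
    return len(covered) / len(test_item_topics)

# Hypothetical data: one topic label per test item, topics taught per classroom.
test_item_topics = ["add", "add", "subtract", "fractions", "measurement"]
classroom_coverage = {
    "classroom_1": {"add", "subtract", "fractions"},
    "classroom_2": {"add", "subtract", "measurement"},
}

share = share_covered_by_all(test_item_topics, classroom_coverage)
```

Taking the intersection across classrooms is the strict reading of "every student in every classroom"; a per-student version would replace classrooms with individual students.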

If tests without instructional validity are being used for
certification, the students who fail such tests are being penalized for
the failures of the schools and teachers and not for their own
inadequacies. The rational basis for judging student performance in
school is undermined.

If a test has instructional validity, curricular validity can
be argued to have little importance, to be superfluous. In fact,
at least two arguments can be made for curricular validity. One is
that curriculum materials can serve to reinforce classroom coverage
of all the content on the test; the other (to be developed in the
next sections) is that it is more difficult to measure instructional
validity than it is to measure curricular validity. There is the
possibility of using curricular validity as a surrogate for
instructional validity, and it is relatively easier to control
curricular validity than instructional validity.



Prototype Measurement of
Curricular and Instructional Validity

In this section we set forth a system to be considered for use
in content validation. Suggestions here are the result of work in
elementary school mathematics. It is our hope that some of what we
have learned in this context might be generalizable to other subject
matters and to other grade levels as well.

A Taxonomy for Measuring Content Validity

In our research on the determinants of content coverage in the
classroom, we were interested in developing an instrument that would
enable us to measure the content of instruction, tests, and curricular
materials. It is our proposal that such a device could also be used
to establish the content validity of a test with respect to any of
the domains discussed in this paper. A taxonomy that enables one to
map the items of a test into their content specifications for
fourth-grade mathematics could be used to characterize the content
domains represented by that test. This taxonomy could also be applied
to the other domains. For example, the domain specified by the
objectives on which the test is to be constructed could be analyzed by
content using this taxonomy. Since this could also be done for the
tests, a way to establish the content validity of the tests is to
determine the degree to which the test item map can be subsumed
under the objective map.

This same strategy could be followed with respect to curricular
materials. The various curricular materials could similarly be
analyzed by content using the taxonomy, and a map could be developed
that suggests the range of topics represented in the domain covered
by the textbook or other curricular materials. The same thing
could be done with respect to the content of the actual classroom
instruction. In a later section we address the additional problem of
how one takes the actual classroom instruction and maps that into the
taxonomy.

Description of a Taxonomy for Elementary School Mathematics

The taxonomy discussed here takes the form of a three-dimensional
matrix. The three dimensions are: the general intent of the lesson
(e.g., conceptual understanding or application), the nature of the
material presented to students (e.g., measurement or decimals), and
the operation students must perform (e.g., estimate or multiply).
Developed in conjunction with this taxonomy is a set of rules to
operationalize the cell boundaries. The application of the taxonomy
to tests and textbook exercises is relatively straightforward,
susceptible to being replicated, and results in high inter-rater
reliability (Freeman et al., 1981).
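
One way to picture a cell of the three-dimensional matrix is as a triple of category labels. The sketch below is an assumed representation, not the authors' instrument: the class name `Cell` and the example classifications are hypothetical, and the classification rules that define the cell boundaries are not modeled.

```python
from typing import NamedTuple

class Cell(NamedTuple):
    intent: str      # general intent of the lesson
    material: str    # nature of the material presented to students
    operation: str   # operation students must perform

# Hypothetical classifications of three test items.
item_cells = [
    Cell("skill", "multiple digit", "add"),
    Cell("application", "decimals", "multiply"),
    Cell("skill", "multiple digit", "add"),
]

# The "map" of a test: the set of distinct cells its items occupy.
test_map = set(item_cells)
```

Because a `NamedTuple` is hashable, maps built this way for tests, objectives, textbooks, and instruction can be compared directly with set operations.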

Application of the Taxonomy to Tests

Each item on the test is examined and classified according to
the taxonomy. The data from such an analysis can be represented by
a mark on the taxonomy that indicates which of the cells in the
taxonomy are covered. After the entire test has been mapped onto the
taxonomy, the result is a visual representation of the areas covered
by that test. This process is illustrated in Figure 3, which portrays
the results of the content analysis of the Stanford Achievement Test
(SAT). It illustrates the flexibility of the taxonomy to describe
content at different levels of detail. Specific topics are represented
by the cells of the classification matrix (e.g., three of the 112
Stanford items focus on the skill of column addition of multiple-digit
numbers). More general topics can also be addressed by summing
across cells to obtain marginal totals (e.g., seven of the 112 items
deal with column addition).

[Figure 3 not reproduced: taxonomy matrix for the SAT, Intermediate
Level I. Nature of Material categories: 1. sing. dig./basic facts;
2. sing. & mult. digit; 3. multiple digit; 4. no. sen./phrase;
5. alg. sen./phrase; 6. sing./like frac.; 7. unlike frac.;
8. mixed no.; 9. decimals; 10. percents; 11. measurement;
12. essn. units of measurement; 13. geometry; 14. other.]

Figure 3. Content analysis of Stanford Achievement Test (Intermediate Level/Grades 4.5-5.6), 1973
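
The step from cell counts to marginal totals can be sketched as follows. Only two numbers here come from the text: the count of 3 items for column addition of multiple-digit numbers and the marginal total of 7 items for column addition. How the remaining 4 items are distributed, and the unrelated "multiply" cell, are assumptions made purely for illustration.

```python
from collections import Counter

# Item counts per (nature of material, operation) cell.
cell_counts = Counter({
    ("multiple digit", "column addition"): 3,        # from the text
    ("sing. & mult. digit", "column addition"): 2,   # assumed
    ("decimals", "column addition"): 2,              # assumed
    ("decimals", "multiply"): 5,                     # assumed
})

def marginal_by_operation(counts):
    """Sum cell counts over every nature-of-material category."""
    totals = Counter()
    for (_material, operation), n in counts.items():
        totals[operation] += n
    return totals

operation_totals = marginal_by_operation(cell_counts)
```

Summing over a different dimension (for example, over operations within one nature-of-material category) gives the other family of marginal totals.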

Application of the Taxonomy to Curricular Materials

The use of the taxonomy to analyze the content of curricular
materials is much more difficult than it is for tests. Lessons in
textbooks contain two distinct components: instructional activities
directed by the teacher and practice exercises assigned to students.
Our analyses of textbooks were limited to items in the student
exercise portion of each lesson. The number of items to be classified
for the student exercise portions of the three textbooks that we have
worked with ranges from a low of 4,2#8 items in the Addison-Wesley
textbook to a high of 6,968 items in the Houghton-Mifflin text.
These figures show that the content classification of
curricular materials such as textbooks is extensive and time
consuming.

To illustrate the application of the taxonomy to curricular
materials, we provide the results from the content analysis of three
fourth-grade textbooks: Mathematics in Our World, Addison-Wesley
Publishing Co., 1978; Mathematics, Houghton-Mifflin Co., 1978; and
Mathematics Around Us, Scott-Foresman and Co., 1978.

An analysis of content at the cell level within the taxonomy
provides a basis for comparing the treatment of specific topics within
textbooks (e.g., applications involving the multiplication of
single-digit numbers). Figure 4 depicts the concentration of items
representing specific topics within one of the three texts. Four
general categories are used to depict the relative frequency of items
in each cell of the taxonomy. The symbol "H" denotes high frequency
cells into which 5% to 10% of the items fall. When the "H" is circled,
over 10% of the items in the text are concentrated in that cell. An
"M" designates cells containing a moderate concentration of items
(0.5% to 5.0%) and "L" indicates cells containing a low frequency of
items (less than 0.5%). Cells that have no symbols are "empty,"
meaning that content does not occur in the textbook.

[Figure 4 not reproduced: taxonomy matrix (Conceptual Understanding
panel; operations by Nature of Material columns 1-14). Legend:
L = less than 0.5% (<21 items); M = 0.5% to 5.0% (21-214 items);
H = 5% to 10% (215-429 items); circled H = more than 10% (>429 items).
Nature of Material categories: 1. sing. dig./basic facts;
2. sing. & mult. digit; 3. multiple digit; 4. no. sen./phrase;
5. alg. sen./phrase; 6. sing./like frac.; 7. unlike frac.;
8. mixed no.; 9. decimals; 10. percents; 11. measurement;
12. essn. units of measurement; 13. geometry; 14. other.]

Figure 4. Distribution of items in the Addison-Wesley fourth-grade text
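
The four frequency categories can be expressed as a simple classification rule. This is a sketch under one stated assumption: the stated ranges for "M" (0.5% to 5.0%) and "H" (5% to 10%) meet at exactly 5%, and the code arbitrarily assigns that boundary to "M" and exactly 10% to "H". The function name and the string "(H)" for the circled H are illustrative choices.

```python
def frequency_symbol(cell_items, total_items):
    """Map a cell's share of textbook items to a frequency symbol."""
    if cell_items == 0:
        return ""            # empty cell: content does not occur
    share = 100.0 * cell_items / total_items
    if share < 0.5:
        return "L"           # low frequency
    if share <= 5.0:
        return "M"           # moderate concentration
    if share <= 10.0:
        return "H"           # high frequency
    return "(H)"             # circled H: over 10% of the items
```

For instance, with a hypothetical book of 1,000 items, a cell holding 60 items (6%) would be marked "H" and a cell holding 3 items (0.3%) would be marked "L".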
Each textbook's distribution of specific topics across the
categories of concepts, skills, and applications is presented in Table 1.
From this table it can be seen that of the 293 topics included in one
or more of the three books, 51% were included in the Addison-Wesley
text, 57% in the Houghton-Mifflin text, and 67% in the Scott-Foresman
text. Because there was overlap in topic coverage (i.e., some topics
were covered in two or three of the books), the cell frequencies in
Table 1 sum to more than 293. Nevertheless, this analysis reveals
that any given book covers only a little more than half the topics
presented in all three books collectively.
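
The percentages quoted above follow directly from the per-book totals reported in Table 1 and the 293 distinct topics. A short check, with variable names chosen here for illustration:

```python
TOTAL_TOPICS = 293  # topics appearing in one or more of the three books

topics_per_book = {
    "Addison-Wesley": 148,
    "Houghton-Mifflin": 167,
    "Scott-Foresman": 197,
}

percent_of_all_topics = {
    book: round(100 * n / TOTAL_TOPICS)
    for book, n in topics_per_book.items()
}

# The per-book counts sum past 293 because many topics occur in several books.
sum_of_counts = sum(topics_per_book.values())
```

The rounded values reproduce the 51%, 57%, and 67% given in the text, and the per-book counts add up to 512, well above 293.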




Table 1.

Distribution of Specific Topics Across
Concepts, Skills, and Applications

                Addison-    Houghton-    Scott-
                Wesley      Mifflin      Foresman

Concepts          23           42           56
Skills            53           52           66
Applications      72           73           75

Totals           148          167          197

The Establishment of Curricular Validity: Textbooks

In this section we illustrate the way in which curricular validity
can be established by examining the map between the several tests
and textbooks illustrated in the previous section. In order to put
this in the context that we have been considering, assume that the
test's content validity with respect to the objectives domain has
already been established and that what is being considered here is
the additional question of the degree to which the test has curricular
validity with respect to the textbook being used in that particular
district.

For purposes of illustration, data are presented that contrast
the content domains specified by (1) the five most frequently used
standardized tests in mathematics and (2) three textbooks. With our data
we can ask what percent of the topics on a test are covered in a given
textbook. The columns labeled "T" in Table 2 describe the percent
of topics in each test that served as the focus of at least one item
in the student exercises in each book. In interpreting these figures,
it is important to remember that at least 4,000 items were classified
for each book. The percent of tested topics covered in a given
book ranged from a low of 52.8% for the SAT and Houghton-Mifflin
text to a high of 73.7% for the MAT and Houghton-Mifflin textbook.
Thus, only about one-half of the topics that were considered in the
SAT were covered by one or more of the 6,986 items in the student
exercise portions of the Houghton-Mifflin text.

The columns labeled "T'" in Table 2 describe the percent of test
topics that served as the focus of at least 20 items in each book.
If one assumes that this subset of book topics represents the content
students will have had an adequate opportunity to learn or to
practice during the preceding academic year, these figures should provide
reasonable estimates of the relation between test content and the content
of instruction suggested by the book. These values ranged from a low
of 20.8% for the SAT and Houghton-Mifflin text to a high of 42.1% for
the MAT and Scott-Foresman text. In other words, the proportion of
topics presented on a standardized test that received more than cursory
treatment in each textbook was never higher than 50%!

Table 2.

Percent of Tested Topics Covered in Each Textbook

                                  Publisher
               Addison-Wesley   Houghton-Mifflin   Scott-Foresman
Tests (c)       T (a)   T' (b)     T       T'         T       T'
               (148)(d)  (42)    (167)    (49)      (197)    (50)

MAT (38)        63.2    31.6     73.7    39.5       73.7    42.1
SAT (72)        54.1    22.2     52.8    20.8       62.5    22.2
IOWA (66)       54.5    25.8     72.7    31.8       71.2    25.8
CTBS I (53)     56.6    32.1     64.2    37.7       64.2    35.8
CTBS II (61)    60.7    27.9     59.0    37.7       67.2    34.4

a T = Topics covered by at least one item in the book.
b T' = Topics covered by at least 20 items in the book.
c Numbers in parentheses indicate the total number of topics in each
  test that are covered in all three books.
d Numbers in parentheses across textbooks indicate the number of items
  in each textbook that are covered in all tests.
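
The two coverage measures differ only in the minimum number of exercises required before a topic counts as covered. The sketch below illustrates that calculation; the function name and all data are hypothetical and do not reproduce any entry of Table 2.

```python
def percent_tested_topics_covered(test_topics, book_item_counts, min_items=1):
    """Percent of a test's topics that are the focus of at least
    min_items exercises in a textbook."""
    covered = [
        t for t in test_topics
        if book_item_counts.get(t, 0) >= min_items
    ]
    return 100.0 * len(covered) / len(test_topics)

# Hypothetical data: four tested topics and exercise counts for one book.
test_topics = {"column_addition", "estimation", "percents", "geometry"}
book_item_counts = {"column_addition": 120, "estimation": 6, "decimals": 40}

t = percent_tested_topics_covered(test_topics, book_item_counts)
t_prime = percent_tested_topics_covered(test_topics, book_item_counts, 20)
```

In this toy case half of the tested topics appear somewhere in the book, but only a quarter receive at least 20 exercises, which mirrors the drop from T to T' seen in the table.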

The Establishment of Curricular Validity: Objectives

Still another example of an examination of curricular validity is
presented with respect to the mathematical objectives used in a district,
which we call Knoxport. The full strand of mathematical objectives,
excluding those dealing with enrichment, was subjected to a content
analysis. This mapping of the objectives was then contrasted with the
Stanford Achievement Test, which also happens to be the standardized
test administered in that district.

The content specified by the objectives is not totally covered
on the Stanford Achievement Test, nor, for that matter, are the
topics tested on the Stanford Achievement Test all present in the
district objectives. There is a fair amount of overlap between the
two sources, but it is in no sense complete.

One way to suggest this comparison is in terms of the objectives
used by the school district. Of the total number of objectives,
56 percent have content that is tested on the Stanford Achievement Test.
These 52 objectives, however, do not represent distinct topics as
defined by the taxonomy. In fact, the 52 objectives are classified
into 24 cells of the taxonomy. Another way to think of this lack
of consistency is to point out that 44 percent of the topics covered
by the district objectives are topics that are also tested by the
SAT. Either figure implies that only about one-half of the content
covered by the objectives is tested by the Stanford Achievement Test.

Another way to look at the lack of consistency between the two
is to consider it from the other point of view. The items on the
SAT represent 61 cells or topics in terms of the taxonomy. Recalling
that 24 topics are covered by both the district objectives and the
SAT, approximately 40 percent of all topics covered by the SAT are
also covered by the district objectives. From this perspective, there
is an even greater discrepancy: the SAT items deal with many topics
not covered by the district objectives.
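
The two directions of comparison use the same 24 shared cells with different denominators. The check below reproduces the "approximately 40 percent" figure from the numbers given in the text. The last line is an inference rather than a reported figure: the text gives 44 percent but not the number of cells occupied by the district objectives, so the implied count is only approximate.

```python
shared_cells = 24   # taxonomy cells covered by both the objectives and the SAT
sat_cells = 61      # taxonomy cells covered by SAT items

# Share of SAT topics that also appear in the district objectives.
share_of_sat_in_objectives = 100 * shared_cells / sat_cells

# Inferred, not reported: if 24 cells are 44 percent of the objective
# cells, the objectives occupy roughly this many cells.
implied_objective_cells = shared_cells / 0.44
```

The first value is a little over 39 percent, consistent with the rounded figure in the text.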

Establishment of Instructional Validity

The application of the taxonomy to instructional content is a
much more difficult task. Tests and curricular materials are almost
always expressed in written form and hence are rather easily
subjected to a content analysis using the taxonomy. The content of
instruction, however, is more elusive, as it represents an ongoing
process that is presented to the students interactively with the teacher.
Obtaining data on instruction and detailing the content of that
instruction is a difficult task.

At the Institute for Research on Teaching (IRT) we have used
various forms for the collection of such information. The most costly
is field observation. In this approach, trained observers record
during the course of the day what topics are covered and for what
periods of time. A cheaper and more straightforward approach to
the problem is to have teachers keep daily logs in which they record
the content of their instruction.

An important question is the degree to which logs kept by teachers
are an accurate representation of topic coverage. In one project at
the IRT, we found that teachers in general were able to keep fairly
accurate logs. An analysis of the measurement error inherent in the
process is being conducted, and preliminary results suggest that in
the aggregate (that is, averaged over days), the amount of error in
using the logs to represent content/time allocations of teachers is
not unacceptably large.



Hence, the instructional validity of a test can be established
here as was illustrated with respect to curricular validity. The
only difference is that within this context the content profile
of the test is contrasted with the content profile derived from the
log analysis. The degree to which the two are consistent with each
other is the extent to which instructional validity is present for
the test and for the students in that particular class. None of
the data from our study were in a complete enough form to provide
us with an illustration contrasting content coverage against one of
the standardized tests. We can, however, discuss, in general,
examples of teachers who used the same materials but whose content
coverage varied.
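
Since no complete data set was available for such a contrast, the following is purely a hypothetical sketch of how a content profile might be derived from daily logs and compared with a test. The log format (topic mapped to minutes per day), the function names, and the one-minute threshold are all assumptions, not the IRT procedure.

```python
from collections import defaultdict

def content_profile(daily_logs):
    """Total minutes per topic, aggregated over all logged days."""
    minutes = defaultdict(int)
    for day in daily_logs:
        for topic, mins in day.items():
            minutes[topic] += mins
    return dict(minutes)

def percent_of_test_topics_taught(test_topics, profile, min_minutes=1):
    """Percent of tested topics that received at least min_minutes."""
    taught = [t for t in test_topics if profile.get(t, 0) >= min_minutes]
    return 100.0 * len(taught) / len(test_topics)

# Hypothetical daily logs for one teacher: topic -> minutes of instruction.
logs = [
    {"column_addition": 30, "estimation": 10},
    {"column_addition": 20},
    {"fractions": 45},
]
profile = content_profile(logs)
tested = {"column_addition", "fractions", "percents", "geometry"}
overlap = percent_of_test_topics_taught(tested, profile)
```

Aggregating over days before comparing reflects the point made earlier that log error appears tolerable in the aggregate; a higher `min_minutes` would demand more than passing attention to a topic.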

During the 1979-80 school year we collected extensive data on
seven teachers in three different districts. We interviewed the
teachers weekly, observed their classroom instruction, and had
them keep daily logs recording their fourth-grade mathematics
instruction.

In the Sawyer district we observed two teachers whom we shall
call Wilma and Jacqueline. The Sawyer district had a mandated
mathematics textbook series (Holt) that all teachers were required
to use. However, teachers were not told they had to teach all topics
from the book. In order to place the observations made in this study
into the context of this paper, imagine that this district is
considering a minimal competency test for promotion from fourth to
fifth grade and that we are concerned with the mathematics section
(a totally hypothetical situation). Let us further assume that the
superintendent and curriculum director have specified the domain upon
which this test is to be developed and that, through a careful
analysis of the Holt text, they have decided that the domain specified
by the objectives is a proper subset of the domain of topics generated
by the Holt textbook. In other words, the test has curricular
validity. But would it also have instructional validity? One might
think that having established curricular validity and also having a
standardized textbook, so as to assure curricular validity for all
students in the district, would assure that all students would receive
instruction on every topic contained in the domain specified by the
objectives. In other words, this would ensure instructional validity.
However,
in Sawyer the two teachers we observed treated the textbooks in
very different ways. For Jacqueline, the textbook essentially defined,
for this particular year at least, the content of her instruction.
She followed the textbook in an almost linear fashion, covering it
page by page until she ran out of time at the end of the year (at
Chapter Nine). A test such as the hypothetical promotion test
suggested above would have had instructional validity for Jacqueline's
classroom if it contained the same content as the first nine chapters
of the textbook.

Two caveats need mentioning. Three students in Jacqueline's
classroom were put into a special subgroup that used the third-grade
Holt book because these students were below grade level. Obviously
a mathematics test that matched the fourth-grade text would not
have had instructional validity for these students. It is also
interesting to note that if the material covered in the test was not
concentrated at the beginning of the textbook but was found throughout
the textbook, then the issue of how far the students went in the
textbook does determine whether the test would have instructional
validity. In other words, if the test examines domain topics covered
in the back sections of the book, then Jacqueline's students would
not have been instructed in them and the certification test would
not have been instructionally valid for those students.
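
The distinction drawn in this example can be stated as two set
comparisons. The sketch below assumes an invented twelve-chapter
textbook with one topic per chapter and an invented objectives domain;
none of it reproduces the actual contents of the Holt series or any
district's objectives. It only shows how a test can pass the
curricular check and still fail the instructional one when a class
stops at Chapter Nine.

```python
# Hypothetical textbook: the topic of chapter n is chapter_topics[n - 1].
chapter_topics = [
    "place value", "addition", "subtraction", "multiplication",
    "division", "measurement", "geometry", "fractions", "graphs",
    "decimals", "time and money", "probability",
]

# Hypothetical objectives domain on which the promotion test is built.
objectives = {"addition", "subtraction", "multiplication", "fractions", "decimals"}

# Curricular validity: the objectives domain is a proper subset of the
# domain of topics generated by the textbook.
textbook_domain = set(chapter_topics)
curricular = objectives < textbook_domain

# Instructional content for a class that covered chapters 1 through 9 in order.
taught = set(chapter_topics[:9])

# Objectives that the class never reached, and the resulting judgment.
missing = objectives - taught
instructionally_valid = not missing

print(curricular, sorted(missing), instructionally_valid)
```

Under these assumptions the single objective located in Chapter Ten is
enough to break instructional validity, even though the curricular
check succeeds.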

The other teacher in this district, Wilma, did not follow the
textbook in any straightforward fashion. In fact, this teacher had
her own conception of what should be covered in fourth-grade
mathematics. This conception not only included a detailing of the
topics that should be covered but also a time schedule as to when
these topics should be covered. As a result of this, Wilma did not
cover the textbook. She rearranged the order in which she covered
things in the textbook, skipped sections of the book that she did not
find consistent with her own conception of what should be covered, and
added to the instruction topics that were not contained in the book.

In this case, it is clear that although the textbook was mandated,
the teacher chose to use it in her own fashion. If any of the topics
that she chose to skip were a part of the domain on which the test was
based, then, despite the fact that the test had curricular validity,
it would not totally have had instructional validity for the students
in Wilma's class. In general, our data show that students in this
district, using the same mandated textbook in mathematics, received
different instructional content.

The implication of this is that the hypothetical promotion/
certification test would have had content validity with respect to
both the objectives and the curricular material for all students in
the district, but for the students in Jacqueline's class the test
would additionally have had better content validity with respect to
the domain of instructional content than would have been the case
for students in Wilma's class.

Consider one last example. In Knoxport a detailed strand of
objectives for mathematics was required for use by all teachers.
Associated with this set of mathematical objectives was a management
system that included locator tests, pretests, and mastery tests.
Teachers kept records on the objectives that students had passed.
In fact, although to the best of our knowledge it was never invoked,
a policy existed whereby teachers could be released from their jobs
if they did not use the MBO system and have the students in their
class work through the objectives. It is interesting that even in
this district, with paper sanctions for not following the system, we
found, among the three teachers studied in this district, a lack of
consistency in terms of their students covering the objectives. One
of the teachers, Andy, almost totally followed the MBO system and had
his students work systematically through the objectives, one by one,
until they passed the mastery tests. For students in his class any
test for advancement made consistent with the objectives would have
been valid with respect to instructional content at least for some
students but would have varied student by student, since not all were
able to progress through all the objectives for that grade level due
to self pacing.

However, in the same district two other teachers, Terry and
Lucy, followed the MBO system less closely for their mathematics
instruction. Lucy provided two mathematics sessions, one devoted to
regular mathematics instruction and the other devoted to using the
individualized objectives system. The other teacher, Terry, rarely
used the MBO system, and, in fact, by the end of the year, the students
had spent very little time in the system. Although students in Terry's
class were from the same district, their testing would not necessarily
have been valid in content even if it were consistent with district
objectives.

The Three Types of Content Validity 
and Implications for Curriculum Policy Making

Studies we have done indicate that if no efforts are made to
assure curricular or instructional validity, a test that has content
validity with respect to the objectives domain would vary in its
curricular and instructional validity for different students. Consider
for example a test that has curricular validity (i.e., the test
has content validity both with respect to the domain of objectives
and with respect to the domain defined by the curricular materials).
A test will not have this characteristic unless the materials have
been standardized for the population being tested (e.g., the state
or district). Otherwise one must talk about curricular validity in
relationship to some district, school, or building. In this way,
validity becomes a variable and is not a constant characteristic of
the test itself as it is in classical test theory and in the case
of content validity based on a set of objectives. In general, for
curricular validity to hold, the curricular materials must be uniform
among the population for which the test is designed. For example,
statewide adoptions of textbooks would assure that a test based on the
objectives and consistent with the textbook would have content
validity both with respect to the objectives domain and the curricular
materials domain.

Many educators assume that all basic textbooks in a certain
subject matter area cover the same basic content and are in fact
interchangeable. This would imply that the test would have curricular
validity with respect to any one of these textbooks. In the work
we have done with three fourth-grade mathematics textbooks
(Scott-Foresman, Houghton-Mifflin, and Addison-Wesley), we found
substantial differences among them, which implies that these books
are not interchangeable with respect to their definition of a
curricular domain. One cannot assume that any book within a certain
subject matter will guarantee curricular validity. Once the content
domain with respect to the objectives is specified, careful analysis
of the major textbooks in the field must be undertaken so as to
guarantee that the content domain specified by the objectives is in
fact coincident with, or at least a subset of, the domain defined by
the curricular material.

At this point it is reasonable to ask why anyone would be
particularly concerned about a test having curricular validity.
One reason is the belief by many educators that the materials
do in fact specify the actual instruction to take place in the
classroom (i.e., by assuring that a test has curricular validity you
are also simultaneously assuring instructional validity).

Also on the practical side, policies that ensure curricular
validity are more easily established than is the case for instructional
validity. For example, it is relatively straightforward for a district
superintendent or state superintendent to mandate textbooks or
curricular materials to be used in the schools within that unit. This
is not the case for mandating the actual content of instruction.
It is also easier to establish whether a test has content validity
with respect to the curricular materials domain. The establishment
of instructional validity is much more difficult and time consuming.

But we have found that even when materials are required to be
used by all teachers within a district or building, this does not
guarantee that what is in those curricular materials will necessarily
be covered in every classroom. Many teachers operate relatively
autonomously in defining the content of their instruction. This
we at least found to be the case for fourth-grade mathematics.
Some teachers follow textbooks and other curricular materials
almost to a tee, whereas other teachers in those same districts
and under the same mandates will not necessarily cover or follow
the textbooks. Consider the case of Jacqueline and Wilma as reported
previously. It appears to us that verifying the consistency between
test items and the curricular domain does not ensure consistency
between test items and the instructional content domain. Since the
latter is the desired standard, the former would only be useful when
it could serve as a surrogate for the latter. The research we have
done certainly challenges the expectation that this would occur
frequently.

How could curricular validity serve as a reasonable surrogate
for instructional validity? If management systems such as the MBO
system used in Knoxport were to have associated with them stringent
rewards and punishments that assured that all children will cover
the objectives, and if a test has curricular validity with respect
to the objectives, this system might guarantee instructional validity.

Why Does Instructional Validity Vary?

One might ask why all students in the same classroom do not receive
identical instructional content. Two reasons are suggested by our
research. The first pertains to grouping strategies. If the class
is always taught as a whole group, then all children within that
classroom will receive the same basic content. Hence, for this
situation, a test that has instructional validity for the class has
it for all students within the classroom.

If the instruction within the classroom is provided on a subgroup
or individual basis, identical instructional content is not
necessarily assured across all individuals or subgroups of students.
This would imply that, even within the same classroom, a test might
have instructional validity with respect to one subgroup but not
with respect to others.

Many of the certification or diploma tests measure cumulative
types of educational experiences. A second reason why instructional
validity is not guaranteed for all students in the same classroom
is that content assumed to have been covered in a previous grade
level, and hence not covered in the present grade, might not have been
covered for all individuals (e.g., because of the classroom from
which they came). So, for some students, certain content is not
covered. To the extent that this happens, it exacerbates the problem
of guaranteeing instructional validity for all children.

The point of this section is that instructional validity will
not occur naturally. One way to encourage instructional validity
is to require that certain curricular materials, consistent with the
domain specified by the objectives, be used and that sanctions be
included so that teachers are more likely to cover those objectives
using the instructional materials provided. One also wonders if
the long-held notion of teaching to the test might have a positive
effect in encouraging instructional validity.

When the test is first administered (and assuming it is valid
with respect to the objectives), one cannot necessarily expect the
objectives domain to be a subset of the instructional content domain
for all students unless one puts some constraints on what is taught.
A reasonable constraint is to require some level of performance on
the test as a criterion for graduation. Requiring that this test
have instructional validity before it can be used (as some have
argued) is like a "Catch 22" since instructional validity is only
likely to occur after such a testing practice has been in place
for a while.

If a test used for certification is administered for several
years prior to the time it will actually be used for certification,
and if decisions and careful content analyses of the objectives
domain on which the test is based (at the level of detail suggested
by our taxonomy) are made available to the teachers, it seems likely
that teachers would begin to teach to the test, which would provide
for greater instructional validity.

Another way of ensuring instructional validity is to give the
test initially as a diagnostic device and then give remediation to
students on the topics they fail. This in fact is pretty much the
way the New York Regents Competency Test is supposed to work. The
test is first given in 9th grade. Students who fail are put in a
special help class. They can take the test as many times as needed
to pass.



Summary 

In this paper we have examined three concepts: content
validity, instructional validity, and curricular validity. All
three deal with content validity but the domain differs among the three.
We maintain that certification tests must have instructional validity,
by which we mean that the test must be valid both with respect to
the domain used to define the minimum competencies and the instructional
content domain (i.e., what is taught in the schools). We further argue
that the test items must be representative of the objectives domain
but not necessarily representative of the instructional content domain.

Whether a test has content validity with respect to the domain
specified by the curricular materials is important only insofar as it
is a surrogate for instructional validity. Some might believe that
an analysis of the curricular materials tells us what content is
covered in the schools. Our work suggests that this is far from
true. Teachers in the United States generally operate fairly
autonomously as decision makers in defining the content of their
instruction. They are influenced by many sources other than curricular
materials, such as tests, principals, and other teachers. It is for
this reason that curricular validity should not be used as a criterion
to establish the instructional validity of a certification test.
For tests of certification there must be some other way to assure
that a relatively large percentage of the items represent topics
that are covered by all students in the district and/or state (i.e.,
to assure that certification tests have instructional validity).
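
The closing criterion, the percentage of items representing topics
covered by all students, lends itself to a simple calculation. The
sketch below is only one possible reading of that criterion. The
classroom labels, topic lists, item list, and the function name
share_of_items_covered_by_all are all hypothetical, and no particular
threshold for "relatively large" is implied.

```python
def share_of_items_covered_by_all(item_topics, coverage_by_group):
    """Fraction of test items whose topic was taught in every group."""
    if not item_topics or not coverage_by_group:
        raise ValueError("need at least one test item and one group")
    # Topics that every classroom (or subgroup) actually covered.
    common = set.intersection(*(set(c) for c in coverage_by_group.values()))
    covered = [topic for topic in item_topics if topic in common]
    return len(covered) / len(item_topics)


# Hypothetical test: one topic label per item.
item_topics = [
    "addition", "addition", "subtraction", "multiplication",
    "fractions", "decimals", "geometry", "multiplication",
]

# Hypothetical instructional content derived from logs, by classroom.
coverage_by_group = {
    "classroom A": {"addition", "subtraction", "multiplication", "fractions", "geometry"},
    "classroom B": {"addition", "subtraction", "multiplication", "decimals"},
    "classroom C": {"addition", "subtraction", "multiplication", "fractions"},
}

share = share_of_items_covered_by_all(item_topics, coverage_by_group)
print(share)
```

Because the calculation intersects coverage across groups, a single
classroom or subgroup that skips a topic removes every item on that
topic from the covered share, which mirrors the variation among
teachers and subgroups reported above.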
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